LexEQUAL: Supporting Multiscript Matching in Database Systems
Authors
Abstract
To effectively support today’s global economy, database systems need to store and manipulate text data in multiple languages simultaneously. Current database systems do support the storage and management of multilingual data, but are not capable of querying or matching text data across different scripts. As a first step towards addressing this lacuna, we propose here a new query operator called LexEQUAL, which supports multiscript matching of proper names. The operator is implemented by first transforming matches in multiscript text space into matches in the equivalent phoneme space, and then using standard approximate matching techniques to compare these phoneme strings. The algorithm incorporates tunable parameters that impact the phonetic match quality and thereby determine the match performance in the multiscript space. We evaluate the performance of the LexEQUAL operator on a real multiscript names dataset and demonstrate that it is possible to simultaneously achieve good recall and precision by appropriate parameter settings. We also show that the operator run-time can be made extremely efficient by utilizing a combination of q-gram and database indexing techniques. Thus, we show that the LexEQUAL operator can complement the standard lexicographic operators, representing a first step towards achieving complete multilingual functionality in database systems.
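The matching idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The phoneme strings in `TOY_PHONEMES` are hand-written placeholders standing in for the output of a real text-to-phoneme converter, and the names `lexequal` and `threshold`, as well as the choice to scale the allowed edit distance by the shorter phoneme string, are assumptions made for illustration.

```python
# Hypothetical, hand-written phoneme strings; a real system would obtain
# these from a language-specific text-to-phoneme converter.
TOY_PHONEMES = {
    "Nehru": ("n", "e", "r", "u"),
    "नेहरु": ("n", "e", "h", "r", "u"),
    "Gandhi": ("g", "a", "n", "d", "i"),
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def lexequal(name1, name2, threshold=0.25):
    """Return True if the two names match in phoneme space.

    `threshold` is the tunable parameter (an assumption of this sketch):
    the allowed edit distance as a fraction of the shorter phoneme string.
    """
    p1, p2 = TOY_PHONEMES[name1], TOY_PHONEMES[name2]
    allowed = threshold * min(len(p1), len(p2))
    return edit_distance(p1, p2) <= allowed


if __name__ == "__main__":
    print(lexequal("Nehru", "नेहरु"))   # names in two scripts, close phonemes
    print(lexequal("Nehru", "Gandhi"))  # unrelated names
```

Raising `threshold` admits more distant phoneme strings, which trades precision for recall; lowering it does the opposite. A q-gram filter or a database index, as mentioned in the abstract, would sit in front of `edit_distance` to prune candidate pairs and is omitted here.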
Similar resources
CPW-Fed Circularly Polarized Slot Antenna with Elliptical-Shaped Patch for UWB Applications
A new design of a coplanar waveguide (CPW)-fed antenna with circular polarization (CP) and excellent impedance matching is presented. In this design, a pair of circular-shaped slits is applied to opposite corners of the slot to enhance the impedance matching, realizing a bandwidth of 134.43% across 2.98-15.20 GHz for VSWR≤2. Furthermore, this structure exhibits an axial ratio bandwidth (ARBW) of 3...
Multi-lingual Semantic Matching with OrdPath in Relational Systems
The volume of information in natural languages in electronic format is increasing exponentially. The demographics of users of information management systems are becoming increasingly multilingual. Together these trends create a requirement for information management systems to support processing of information in multiple natural languages seamlessly. Database systems, the backbones of informat...
Multiscript - an online student-teacher collaboration platform for classroom lectures
Online collaboration on lecture contents has gained much popularity over the past few decades, due to its potential to enhance the learning experience. We propose a novel online collaboration platform, called Multiscript (MS), for students and the teacher to work together on classroom lectures. MS combines two online learning approaches into a single collaboration platform. One approach, called outs...
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontology is the main infrastructure of the Semantic Web, providing facilities for integrating, searching, and sharing information on the web. The development of ontologies as the basis of the Semantic Web, together with their heterogeneity, has led to the need for ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems face problems such as memory con...
Comparative analysis of profit between three dissimilar repairable redundant systems using supporting external device for operation
Promoting and sustaining industries, manufacturing systems, and the economy through reliability measurement has become an area of interest. The profit of a system may be enhanced by using a highly reliable structural design of the system or subsystems of higher reliability. As the reliability and availability of a system improve, the production and associated profit will also increase. Re...